# Visualization and Analysis
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
library(ggplot2)
library(ggdendro)
library(readxl)
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
suppressWarnings({
library(readxl)
})
# Modeling and Inference
library(TSA)
##
## Attaching package: 'TSA'
## The following objects are masked from 'package:stats':
##
## acf, arima
## The following object is masked from 'package:utils':
##
## tar
library(forecast)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
## Registered S3 methods overwritten by 'forecast':
## method from
## fitted.Arima TSA
## plot.Arima TSA
library(astsa)
##
## Attaching package: 'astsa'
## The following object is masked from 'package:forecast':
##
## gas
# Importing Data.
# NOTE: Please store the oil.csv file in the directory of this file.
df = read.csv('oil.csv')
summary(df)
## date dcoilwtico
## Length:1218 Min. : 26.19
## Class :character 1st Qu.: 46.41
## Mode :character Median : 53.19
## Mean : 67.71
## 3rd Qu.: 95.66
## Max. :110.62
## NA's :43
From the summary above we can see that the dataset contains 1218 data points, 43 of which are missing values for dcoilwtico. We will address the missing values shortly.
plot1 <- df |>
plot_ly(type="scatter", mode="lines") |>
add_trace(x = ~date, y = ~dcoilwtico, name="Daily Oil Price") |>
layout(showlegend=FALSE, plot_bgcolor = "white")
options(warn=-1)
plot1
Based on the plot, we decided to impute each missing value with the next observed value in the time series (next-observation-carried-backward).
df_new <- df %>% fill(dcoilwtico, .direction="up")
summary(df_new)
## date dcoilwtico
## Length:1218 Min. : 26.19
## Class :character 1st Qu.: 46.42
## Mode :character Median : 53.19
## Mean : 67.67
## 3rd Qu.: 95.59
## Max. :110.62
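A minimal base-R sketch of what `fill(.direction = "up")` does on a toy vector (`na_fill_up` is a hypothetical helper, not a package function):

```r
# Next-observation-carried-backward, as performed by
# tidyr::fill(..., .direction = "up"): walk from the end and let
# each NA take the value that follows it.
na_fill_up <- function(x) {
  for (i in rev(seq_len(length(x) - 1))) {
    if (is.na(x[i])) x[i] <- x[i + 1]
  }
  x
}
na_fill_up(c(NA, 5, NA, NA, 7, NA))
# -> 5 5 7 7 7 NA  (a trailing NA stays NA: no later value to copy back)
```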
We now plot the imputed dataset.
plot2 <- df_new |>
plot_ly(type="scatter", mode="lines") |>
add_trace(x = ~date, y = ~dcoilwtico, name="Daily Oil Price") |>
layout(showlegend=FALSE, plot_bgcolor = "white")
options(warn=-1)
plot2
The plot of the imputed dataset shows no trend or seasonality over time, which makes this series a good candidate for exponential smoothing methods.
The series is also not decomposable: with no seasonal period in the data, a classical decomposition cannot be carried out, which confirms the absence of seasonality.
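A minimal sketch of this non-decomposability check, on simulated data standing in for the oil series (`decompose()` requires a seasonal period of at least 2):

```r
# A series stored with frequency 1 has no seasonal period, so a
# classical decomposition fails -- which is how "not decomposable"
# shows up in practice.
x <- ts(cumsum(rnorm(100)))              # frequency defaults to 1
res <- try(decompose(x), silent = TRUE)
inherits(res, "try-error")               # TRUE: decomposition impossible
```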
Forecasts produced using exponential smoothing methods are weighted averages of past observations, with the weights decaying exponentially as the observations get older. In other words, the more recent the observation the higher the associated weight.
The simplest of the exponential smoothing methods is Simple Exponential Smoothing (SES). This method is suitable for forecasting data with no clear trend or seasonal pattern. For simple exponential smoothing, the only component included is the level, as seen in the forecasting and smoothing equations; the smoothing equation uses a single parameter \(\alpha\) to update the level component of the time series.
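A hand-rolled sketch of the SES level update (illustrative only; `ses_level` is a toy helper, while `ses()` in the forecast package does this with an optimised \(\alpha\)):

```r
# Simple exponential smoothing: the level is a weighted average of the
# latest observation and the previous level, so weights on past values
# decay exponentially. The flat forecast is the final level.
ses_level <- function(y, alpha, l0 = y[1]) {
  l <- l0
  for (t in seq_along(y)) {
    l <- alpha * y[t] + (1 - alpha) * l   # l_t = a*y_t + (1-a)*l_{t-1}
  }
  l                                       # one-step-ahead forecast
}
ses_level(c(10, 12, 11, 13), alpha = 0.5)  # -> 11.875
```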
Holt (1957) and Winters (1960) extended Holt's method to capture seasonality. The Holt-Winters seasonal method comprises the forecast equation and three smoothing equations — one for the level \(l_t\), one for the trend \(b_t\), and one for the seasonal component \(s_t\), with corresponding smoothing parameters \(\alpha\), \(\beta^*\), and \(\gamma\) respectively.
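For reference, the additive Holt-Winters equations in this notation, with seasonal period \(m\) and \(k = \lfloor (h-1)/m \rfloor\), are:

$$
\begin{aligned}
\hat{y}_{t+h|t} &= l_t + h b_t + s_{t+h-m(k+1)} \\
l_t &= \alpha (y_t - s_{t-m}) + (1 - \alpha)(l_{t-1} + b_{t-1}) \\
b_t &= \beta^* (l_t - l_{t-1}) + (1 - \beta^*) b_{t-1} \\
s_t &= \gamma (y_t - l_{t-1} - b_{t-1}) + (1 - \gamma) s_{t-m}
\end{aligned}
$$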
Since the data has no seasonality (evident from the failed decomposition), the Holt-Winters seasonal method is of little use here. We therefore compare two models: an automatically selected ARIMA model and an exponential smoothing (ETS) model.
modelARIMA <- auto.arima(df_new['dcoilwtico'])
summary(modelARIMA)
## Series: df_new["dcoilwtico"]
## ARIMA(0,1,0)
##
## sigma^2 = 1.449: log likelihood = -1952.37
## AIC=3906.73 AICc=3906.73 BIC=3911.83
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.03759184 1.203095 0.8949041 -0.08030733 1.547907 0.9992644
## ACF1
## Training set -0.04614975
As observed, the best-fitting ARIMA model is ARIMA(p=0, d=1, q=0), with an AICc of 3906.73 and a training RMSE of 1.203095.
fcARIMA <- forecast(modelARIMA, h=100)
autoplot(fcARIMA)
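ARIMA(0,1,0) without drift is a random walk, so its point forecasts are flat at the last observed value. A sketch with simulated data (using base `stats::arima`, not the exact model fitted above):

```r
set.seed(1)
y <- cumsum(rnorm(200))                 # simulated random walk
fit <- arima(y, order = c(0, 1, 0))     # same specification auto.arima chose
pred <- predict(fit, n.ahead = 5)$pred
all(abs(pred - tail(y, 1)) < 1e-8)      # TRUE: every forecast = last value
```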
dfETS <- as.ts(df_new['dcoilwtico'])
modelETS <- ets(dfETS)
summary(modelETS)
## ETS(A,N,N)
##
## Call:
## ets(y = dfETS)
##
## Smoothing parameters:
## alpha = 0.954
##
## Initial states:
## l = 93.1356
##
## sigma: 1.2028
##
## AIC AICc BIC
## 9107.716 9107.735 9123.031
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.03953028 1.20184 0.8965985 -0.08424058 1.551493 1.001156
## ACF1
## Training set -0.0008160611
The best-fitting exponential smoothing model has \(\alpha = 0.954\), an AICc of 9107.735, and a training RMSE of 1.20184, slightly better than the RMSE of the best-fitting ARIMA model. (The AICc values of the two models are not directly comparable, since the ARIMA and ETS model classes use different likelihoods.)
fcETS <- forecast(modelETS, h=100)
autoplot(fcETS)
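ETS(A,N,N) is the state-space form of simple exponential smoothing, so base R's `HoltWinters` with the trend and seasonal components disabled fits essentially the same model. A sketch on simulated data (`y` and `fit` here are toy stand-ins, not the oil series objects above):

```r
set.seed(42)
y <- ts(100 + cumsum(rnorm(300, sd = 1.2)))         # toy non-seasonal series
fit <- HoltWinters(y, beta = FALSE, gamma = FALSE)  # SES: level only
fit$alpha                     # optimised smoothing weight
predict(fit, n.ahead = 3)     # flat forecast at the final smoothed level
```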
Training ARIMA and ETS models on the dataset yields very similar RMSE scores, and the 100-day forecast plots behave similarly as well.
However, the RMSE of the exponential smoothing model (1.20184) is slightly better than that of the ARIMA model (1.203095), so for our data the exponential smoothing model (with \(\alpha = 0.954\) and no trend or seasonal component) is a better model than ARIMA(p=0, d=1, q=0).
** The theoretical passages above quote the source textbook verbatim to preserve the accuracy of the material presented.